A novel subject-wise dictionary learning approach using multi-subject fMRI spatial and temporal components

The conventional dictionary learning (DL) algorithms aim to adapt the dictionary/sparse code to individual functional magnetic resonance imaging (fMRI) data. Thus, lacking the capability to consolidate the spatiotemporal diversities offered by other subjects. Considering that subject-wise (sw) data matrix can be decomposed into the sparse linear combination of multi-subject (MS) time courses and MS spatial maps, two new algorithms, sw sequential DL (swsDL) and sw block DL (swbDL), have been proposed. They are based on the novel framework, defined by the mixing model, where base matrices prepared by operating a computationally fast sparse spatiotemporal blind source separation method over multiple subjects are employed to adapt the mixing matrices to sw training data. They solve the optimization models formulated using \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_0$$\end{document}l0/\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_1$$\end{document}l1-norm penalization/constraints through dictionary/sparse code pair update and alternating minimization approach. They are unique because no existing sparse DL method can incorporate MS spatiotemporal components while updating sw atoms/sparse codes, which can eventually be assembled using neuroscience knowledge to extract group-level dynamics. Various fMRI datasets are used to evaluate and compare the performance of the proposed algorithms with existing state-of-the-art algorithms. Specifically, overall, a \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$14\%$$\end{document}14% increase in the mean correlation value and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$39\%$$\end{document}39% reduction in the mean computation time exhibited by swsDL and swbDL, respectively, over the adaptive consistent sequential dictionary algorithm.


Background
The ACSD algorithm 41 had established that instead of relying on the conventional alternating minimization approach as proposed in the KSVD algorithm, updating elements of the dictionary and the corresponding sparse code jointly by solving a penalized rank one error matrix approximation that promotes an adaptive sparse penalty lead to faster convergence and overall higher atom recovery percentage.
For this purpose, consider the fMRI data matrix Y ∈ R N×V constructed from whole-brain BOLD time series that consists of N scans and V voxels.Assuming there is sparseness along the row direction of the fMRI data matrix, then according to the ACSD approach, it can be decomposed into a dictionary matrix D ∈ R N×K whose www.nature.com/scientificreports/columns have normalization constraint and the sparse code matrix X ∈ R K×V whose entries have adaptive sparse penalty expressed as where .F and . 2 is the Frobenius and l 2 norm, respectively, |.| represents the absolute value, hyperparameter k j is the data-driven regularization parameter allocated to each entry of X , N < K signifies overcomplete dictionary, and each atom is normalized to avoid scaling ambiguity.The ACSD algorithm aims to solve Eq. (1) by penalizing the coefficient row in the full error matrix based rank-1 minimization problem to promote the sparsity of x k as where E k = Y − K i=1,i� =k d i x i is the error matrix for all signals from which the k-th atom/sparse code has been removed.The resulting estimate of x k /d k as a pair is given by where T υ (z) = sgn(z) • (|z| − υ1/|z|) + , (z) + , sgn(.) , and • define the component-wise max between (0, z) , the component-wise sign, and the Hadamard product, respectively 59 , 1 is a vector of ones, and k = [ k 1 , . . ., k V ] is obtained using single tuning parameter as 1/|d ⊤ k E k | .Figure 1a describes these updates in form of a flow chart.For fMRI group analysis Eq. ( 1) is modified according to the sgBACES algorithm 50 as where m = {1, . . ., M} , M is the number of subjects, D c /X c stand for the common dictionary/sparse code, and D m /X m represent the subject-specific dictionary/sparse code.The k-th atom/sparse code update for the subject-  (1) 41 , and (b) multi-subject group analysis modeled by Eq. ( 2) 50 .
Vol:.(1234567890) www.nature.com/scientificreports/ The atom/sparse code row in the two equations mentioned above can be solved sequentially as a pair using the ACSD algorithm.Figure 1b gives a flow chart of this routine.

Methods
The DL algorithms, ACSD in particular, have shown superior convergence properties and source recovery precision compared to other BSS methods.Despite this, their applicability is limited to either conventional singlesubject analysis 41 or multi-subject based group analysis 50 as shown in Fig. 1.This paper mainly focuses on extending the DL algorithm to multi-subject based subject-wise analysis to enhance the statistical strength of the single-subject analysis.This expansion would strengthen the accuracy of subject-wise analysis by exploiting hemodynamic variations offered by multiple subjects.It is worth mentioning at this point that sgBACES 50 can also be extended to subject-wise analysis by not combining the common-level residuals and performing DL on each of those residuals individually; however, this approach will raise the computational complexity and remain inapplicable to resting-state data that requires shared spatial maps.

Proposed model
The proposed algorithms secure the swiftly extracted underlying spatial/temporal components from multiple subjects using the ssBSS method and adapt them to subject-wise analysis resulting in increased source recovery performance for each individual.In this regard, consider that each signal in fMRI dataset Y m ∈ R N×V from m- th subject can be represented as a linear combination of a few atoms from subject-wise dictionary D m ∈ R N×K according to the sparse signal strength in each column of the subject-wise coefficient matrix X m ∈ R K×V .How- ever, these subject-wise matrices are constructed using multi-subject atoms/sparse code, which can be accounted for by the multi-subject smooth dictionary D q ∈ R N×MP and the multi-subject sparse code X q ∈ R MP×V treated as base dictionary and base sparse code, respectively.This leads to D m = D q A m and X m = B m X q .The proposed model in its basic form is given as where A m ∈ R MP×K and B m ∈ R K×MP are the representation matrices, a m,k and b k m are the k-th column of A m and k-th row of B m , respectively, K < N < MP < V , and M is the number of subjects.The next subsection describes how to quickly train D q and X q .

Proposed preliminaries
The ssBSS method 49 proposed the following optimization model by considering that dataset Y m can be decom- posed into temporal source matrix T m ∈ R N×P and spatial source matrix S m ∈ R P×V as where T m = T p C m accounts for the smoothness of the BOLD signal by storing DCT bases in T p ∈ R N×K p , and c m,p is the p-th column of the sparse representation matrix C m , P < K p < N . .0 is the l 0 norm that induces spar- sity by counting the number of non-zero elements, S m 1 is the l 1 norm on S m given as K k=1 V j=1 |s k m,j | , and 1 is the sparsity hyperparameter that regulates the coefficient values.To solve Eq. ( 4) efficiently, blind source separation theory is employed that breaks it into the following pair of spatial and temporal source separation problems where the unknowns Q m ∈ R P×K and Z m ∈ R K×P are the mixing matrices, X m,t ∈ R N×K and X m,s ∈ R K×V contain the temporal and spatial features in the reduced dimension, respectively, and 1 / 2 / 3 are the sparsity regularization hyperparameters.For this article, both mixing matrices are assumed non-sparse, and hence their associated sparsity parameters 1 / 2 can be ignored, and the simplified model is given as The feature matrices are obtained using singular value decomposition (SVD), and the unknowns are solved using alternating least squares and soft thresholding via Neumann's alternating projection lemma.

Proposed algorithms swbDL
In order to reinforce the fidelity of subject-wise recovered temporal dynamics, the autocorrelations of each dictionary atom at lag-1 are considered.Instead of directly incorporating the delayed time series 50 , the temporal correlation structure between the current and lagged dictionary is penalized so that for some sufficiently large α , most of the entries that define the difference between the correlation structure α will shrink to zero.Thus, to update representation matrices A m and B m , problem (3) can be modified as follows where D 0 is a time-delayed version of the original dictionary.To solve (7), a computationally efficient block update of representation matrices A m and B m is considered.Hence an alternating optimization approach is adopted where one of the two unknown variables is updated while the other is fixed.For this purpose, A m is fixed then the minimization objective reduces to The update for B in Eq. ( 8) is obtained using soft thresholding and least squares as Next, B m is fixed, D q A m is replaced by D m to solve for it before obtaining an update for A m , and the minimiza- tion function becomes A relaxation variable U is introduced in the above equation, and it is reformulated as This can be solved using the ADMM algorithm that provides the closed-form solution for both D m and U , admits only one tuning parameter and converges for all of its positive values 60 .The augmented Lagrangian for Eq. ( 10) is given as where W is the Lagrangian multiplier and β is the tuning parameter.Due to normalization constraint on diction- ary, all columns of D m and U are normalized during each iteration.Initially setting U and W to zero, the solution to (11) is obtained by computing each of the following until convergence An update for A m is obtained during each iteration of the algorithm as The accompanying algorithm for swbDL is given in Table 1 swsDL In contrast, to block update, the sequential approach is presented in this section for a more precise update of the unknowns.To achieve this, instead of deploying the observed data matrix based decomposition, the rank-1 minimization problem based on the error matrix of all signals is considered where adaptive sparse penalty term is introduced for a fairer assignment of penalty to each entry in b k m x q,j .Accordingly, to update representation matrices A m and B m , problem (3) is reformulated as follows (7) where k 2,j is a data-driven regularization parameter allocated to each entry of b k m X q , and error matrix is The l 0 constraint on representation matrices implements the regularization of dictionary atoms and sparse code through sparse basis expansion where bases have been constructed using components extracted from the ssBSS method.The update for a m,k and b k m is obtained by solving the Lagrangian expression for (14) given as Solving this equation with respect to b k m we obtain Because d m,k = D q a m,k due to the definition, d ⊤ m,k d m,k = 1 due to normalization constraint on dictionary col- umns, and x q,j can be extracted out of the third term as its a constant, then the above equation further unfolds as ( 14) www.nature.com/scientificreports/ This can be further simplified using the soft thresholding approach 59 as which can be rewritten in simplified form as , and 2 is a scalar tuning parameter.By taking into account the l 0 norm imposed on b k m the Eq. ( 17) is reformulated as a constrained problem This can be solved as , where the indices set ϑ can be found using thresholded cor- relation values whose related algorithm is described in 50,61,62 .Solving Eq. ( 16) with respect to a m,k , following solution is obtained By considering the l 0 norm on a m,k the Eq. ( 19) as a constrained problem is given as ε admits only the nonzero entries of the coefficient row through indices set ε 50 leading to a reduced error matrix and reduced compu- tational cost.The swsDL algorithm is described in Table 2.

Proposed framework
Since the proposed model is applied to multiple subjects, group-level analysis was also implemented in addition to the subject-wise inferences.In this context, (i) modeled HRF (MHRs) were produced by using the convolution operation between the task stimuli and canonical HRF from the statistical parametric mapping (SPM) toolbox 63 , and (ii) resting-state network templates (RSNs) (R1-R10) were obtained from Smith 64 .While both MHRs and RSNs were used to accomplish group analysis of task-related data, only RSNs were used for group analysis of resting-state data.The following steps were involved in achieving both subject-wise and group-level analysis 1. Preparing the preliminary bases: Source components T m and S m obtained using the fast ssBSS method were concatenated along the spatial (horizontal) and temporal (vertical) dimension, respectively, to construct base dictionary D q and base sparse code X q as 2. Performing subject-wise analysis: Subject-wise D m /X m were trained using either swbDL or swsDL algorithm as described in the previous section.3. Performing group-level analysis: For this, the following scenarios were considered www.nature.com/scientificreports/(a) For task-related stimuli, the atoms in all subject-wise dictionaries that were most correlated with MHRs were assembled along with their corresponding sparse codes in matrices D r /X r to obtain group-level dynamics through rank-1 decomposition via SVD as follows where r = {1, . . ., R} , R is the number of MHRs, j m (r) represents the indices of the most correlated atom in m-th dictionary with r-th MHR, m = {1, . . ., M} , M is the number of subjects, and D g is the group-level dictionary.(b) For resting-state networks, the subject-wise sparse code rows from all subjects that were most correlated with RSNs were assembled as where j m (r) represents the indices of the most correlated coefficient row in m-th sparse code matrix with r-th RSN, and X g is the group-level sparse code.The proposed framework in form of a block diagram is given in Fig. 2.

Ethical approval
This study's block-design and resting-state fMRI datasets are open-access and shared on the human connectome website https:// www.human conne ctome.org/ study/ hcp-young-adult.The condition for using these datasets is properly acknowledging the funding source and citing relevant publications, which we have in the acknowledgment and experiment sections.Therefore, we do not need any approval from the ethics committee of respective institutes.

Experiments
This section evaluates the proposed algorithms to determine their capability compared to the existing state-ofthe-art data-driven algorithms.For this purpose, data analysis was conducted using three different fMRI datasets, one synthetic and two experimental.The participating algorithms are cgICA, sgICA, CODL, ACSD, ssBSS, swbDL, and swsDL.The main reason for not incorporating sgBACES and ShSSDL algorithms in the comparison study is their inability to extract subject-wise dynamics.The Simtb toolbox 65 was utilized to generate the synthetic fMRI dataset of four subjects.The Human Connectome Project (HCP) 66,67 was availed to acquire eight subject's block design fMRI dataset from its quarter 3 release, and eight subject's resting-state fMRI dataset from its S500 and S900 release.These datasets allowed us to assess the performance of all participating algorithms in terms of their potential to retrieve the ground truth.

Synthetic dataset
In this section, a realistic fMRI dataset of four subjects was generated using the Simtb toolbox.Eighteen distinct temporal sources, each consisting of 300 timepoints with a repetition time (TR = 1 sec ) and twelve distinct spatial sources, each consisting of size 50 × 50 voxels, were used to obtain these four datasets.The source IDs for the spatial components were set to {3, 6, 8, 10, 22, 23, 26, 30, 4, 12, 5, 29} .Overall, nine spatiotemporal sources out of all source signals were used to generate each subject's fMRI data.With some variability across subjects, the first six temporal and six respective spatial sources were present in all subjects; the following two spatial sources were also common, but their temporal patterns were unique to each subject, and the last source's both spatial and temporal features were unique to each subject as shown in Fig. 3.For common temporal sources, the variability across subjects was introduced by varying the HRF parameters, such as delay/dispersion of response/undershoot.Similarly, the intersubject variability for the common spatial maps was established by using parameters of the Gaussian distribution (mean ( µ ) and standard devia- tion (std) ( σ )) that allowed controlling the location, orientation, and spread of the activations.This was realized by random translation in x and y direction ( µ = 0, σ = 1.5 ), random rotation ( µ = 0, σ = 0.9 ), and random scaling ( µ = ρ, σ = 0.05 ) as shown in Fig. 3.Here ρ , the Gaussian distribution's mean, is considered the spread parameter using which the spatial extent of the activations was controlled to create five unique cases of spatial overlaps.The corresponding spatial maps with moderate to substantial dependence are shown in Fig. 4a-e.
The first subject's common and unique spatiotemporal sources and three other unique spatiotemporal sources from the remaining three subjects are assembled and treated as the ground truth TCs and SMs for group-level analysis as shown in Fig. 3 under the heading all source SMs/all source TCs.The other subfigures show the spatial and temporal sources that are used to generate all four datasets where each of the main temporal sources TC 7 and TC 8 consists of four unique temporal patterns.Using a linear mixture model, these sources were utilized to generate each subject's dataset as Y = 8 i=1 (tc i + ψ i )(sm i + φ i ) , where the noise generating matrices ∈ R 300×9 and ∈ R 9×2500 were produced using Gaussian distribution ∼ N (0, σ 2 = n t ) and ∼ N (0, 0.01) , respectively, where n t represents the variance of the temporal noise.Depending on the value of ρ , n t , and trial number, the datasets Y M m=1 (where M = 4 ) were then produced and employed by all algorithms for source retrieval.

Synthetic dataset dictionary learning
The parameter values were kept consistent across all algorithms wherever feasible to produce a fair comparison.
In contrast to experimental fMRI data, the ground truth about the number of source signals is known; therefore, the same number of components as the number of generating sources were trained for the simulated dataset.Since the cgICA/sgICA/CODL were applied to the grouped data, the total number of components to be extracted was set to 12.In comparison, it was set to 9 for ACSD/ssBSS/swbDL/swsDL, which were applied subject-wise.In contrast to CODL, which iterated for 30 iterations, all other dictionary learning algorithms, including ssBSS, were run for 15 iterations.After evaluating different strategies, the optimal dictionary initialization for each algorithm was supplied.Concatenated data, random numbers drawn from the standard normal distribution, and DCT bases were employed for CODL, ssBSS, and ACSD/swbDL/swsDL, respectively.The tuning parameters were handled by experimenting with their various combinations.Those values were considered that produce the best results in terms of similarity between the recovered sources and their respective  www.nature.com/scientificreports/ground truth.Twelve components were kept after each of the two PCA reductions in the case of cgICA/sgICA.The best sparsity and smoothing parameter for sgICA was 3 and 50000, respectively.For a fair comparison with other dictionary learning methods, CODL's batch size was adjusted to b = 2500 , and its temporal reduction was avoided, whereas its sparsity parameter was set to 1.5.For ACSD, the best sparsity parameter was found to be 12.For ssBSS, the tuning parameters were set as 1 = 6 and ζ 1 = 30 , K p = 150 , nine components were obtained from PCA, and nine were retained for iterative routine.For swbDL, the tuning parameters were set as 2 = 8 and α = 1 .For swsDL, the best sparsity parameters were found to be ζ 2 = ζ 3 = 24 and 2 = 16.

Synthetic dataset results
The multi-subject dataset generation and the learning process were repeated several times for different noise realizations to demonstrate the robustness and consistency of the proposed algorithms.To achieve this, the experiment, which included both data generation and learning process, was repeated for (i) two different variance (square of the standard deviation) values of the temporal noise set as n t = {0.3,0.9} , (ii) five values of ρ that were varied from 4 to 8 to gradually increase activation overlaps as shown in Fig. 4a-e, and (iii) 150 different trials.Moreover, the source recovery was performed in regards to both subject-wise and group-wise analysis.Underlying source TCs/SMs were obtained by keeping the indices with the highest correlation values after correlating every algorithm's trained dictionary atoms/sparse code rows with the ground truth TCs/SMs.These correlation values were computed with respect to ground truth SMs and are retained as cTC/sSM.For each of the five spatial overlap scenarios and two noise realizations, the mean of the cTC/cSM values over all nine spatiotemporal sources are saved as mcTC/mcSM, their mean mmcTC/mmcSM over 4 subjects, and the mean of mmcTC/ mmcSM over 150 trials are plotted in Fig. 5A for subject-wise analysis, and in Fig. 5B for group-wise analysis (where the mean of correlation values over the subjects has been excluded).The convergence rate and the progression of correlation values for the proposed algorithms as functions of algorithm iterations are shown in Fig. 6.The component-wise visual comparison among participating algorithms for source recovery is provided in Fig. 7.
From Fig. 5, one can conclude that the swsDL algorithm consistently outperformed all other algorithms for all source recovery scenarios, including spatial/temporal feature, subject-wise/group-level analysis, and spatial overlap cases.It attained the highest recovery performance for low noise levels and spatial dependence.Although this performance dropped as noise intensity and spatial overlaps increased, it remained superior to all other competing algorithms.Its block variant swbDL emerged as a runner-up for the subject-wise analysis, whereas ACSD seems to have replaced its runner-up position for the group-level analysis.Moreover, sgICA has outperformed cgICA by exhibiting lower standard deviation and better recovery precision.The sgICA, for high spatial overlap and noise variance, even surpassed the ACSD algorithm for group-level analysis.It is also noticeable that both proposed algorithms performed relatively superior for subject-wise analysis.
It can be deduced from Fig. 6a that the swbDL algorithm, compared to ssBSS, ACSD and swsDL, converged faster and only needed a few iterations to produce the desired results.This trend also manifests in Fig. 6b, where the correlation strength nearly stopped accumulating for swbDL after the fifth iteration.In contrast, ssBSS, ACSD and swsDL algorithms converged slowly and showed source recovery improvement as the number of iterations increased.
In order to produce component-wise visual comparison, the experiment was repeated for subject-wise analysis using parameter settings as n t = 0.3 and ρ = 5 .For swbDL, the tuning parameter value was changed to α = 0.75 .Due to a lack of space and inferior results from CODL and cgICA, they have been dropped for this particular case.The results were extracted for all four subjects; however, the first nine components were selected from subject 1, and components {10, 11, 12} belonged to subject {2, 3, 4} ninth component.The results of this experiment are   www.nature.com/scientificreports/shown in Fig. 7.It can be depicted from this figure that swsDL defeated all other algorithms in source recovery strength for both spatial and temporal features while swbDL was the second best.The correlation values are written at the bottom of each source, with the best values highlighted in red.

Experimental fMRI dataset preprocessing
The HCP had already preprocessed the resting-state datasets using their preprocessing tools; therefore, they were excluded from our preprocessing routine.On the other hand, block design datasets were preprocessed using the SPM-12 toolbox 63 .The steps involved in the preprocessing of this dataset, such as realignment, normalization, spatial smoothing, and masking, are described in depth in 50,68,69 .Firstly, functional images were realigned to the first image to remove motion artifacts.Secondly, all images were spatially normalized to a Tailarach template, resampled to 2 × 2 × 2 mm 3 voxels, and spatially smoothed using a 6 × 6 × 6 mm 3 full-width at half-maximum (FWHM) Gaussian kernel.Thirdly, the masking step attempts to remove any data outside the scalp.Next, for each subject, the four-dimensional dataset was reshaped and saved as a 2-dimensional matrix called Y m to be considered a whole brain dataset, where m = {1, . . ., 8} .This resulted in the size of each subject's Y matrix being 279 × 236115 for the block design dataset and 400 × 230367 for the resting-state dataset.Temporal filtering was performed on both types of datasets as the next step.This consisted of DCT based high-pass filter to remove low-frequency trends and FWHM based low-pass filter to eliminate high-frequency physiological noise.The cutoff for a DCT filter was set to 1/150 Hz for block design and 1/128 Hz for resting-state datasets, and a cutoff for FWHM was set to 1 sec for both block design and resting-state datasets.After executing the aforementioned steps, all columns of Y were normalized to have zero mean and unit variance.

Block design dataset
In this section, we employed the motor task 3T MRI raw block design dataset obtained from the quarter 3 release of the HCP 66,67 .An experiment was conducted for 204 secs to acquire this dataset to map the brain's motor cortex.
During the experiment, the subjects were instructed to tap their right or left fingers, pinch their right or left toes, or move their tongues in response to visual stimuli.Following a three-second visual cue, subjects underwent a specific movement task lasting 12 seconds.Ten movement tasks were considered consisting of two tongue motions, left/right finger, and left/right toe movements.As a result, there were a total of 13 blocks, including three Block design dataset dictionary/component learning For dictionary initialization, concatenated data, random numbers, and DCT bases were employed for CODL, ssBSS, and ACSD/swbDL/swsDL, respectively.The total number of iterations for all dictionary learning algorithms was set to 15 except for CODL and ssBSS, for which this number was set to 30.While performing dimensionality reduction, 100 components were kept from PCA, and 60 were retained when PCA was applied for the second time, and these many were extracted using cgICA and sgICA.These numbers and other parameters in this section were selected after trying their different combinations and considering that the selected ones must produce the best results in terms of correlation strength between the retrieved source and the ground truth.For sgICA, the sparsity parameter was set to 5 while the smoothing parameter was 50000.The total number of dictionary atoms to be trained using CODL was set to 70 with the sparsity parameter set to 6 with batch size equal to the data dimension.Using ACSD, 40 dictionary atoms were trained for each subject with a sparsity parameter set to 60.For ssBSS, 60 components were retained from PCA, and 40 were trained; its other parameters were set as 1 = 16 , ζ 1 = 50 , and K p = 60 .For both swbDL and swsDL, 40 atoms were trained, tuning parameters were set as 2 = 12 and α = 3000 for swbDL, and tuning parameters were set to ζ 2 = ζ 3 = 48 and 2 = 25 for swsDL.

Block design dataset results
In this section, the absence of activation maps for task-related components encouraged us to choose temporal analysis using six constructed MHRs.Similarly, the absence of temporal profiles for resting state networks motivated us to choose some of Smith's templates from R1-R10.The analysis was based on two strategies, i) subjectwise and ii) group-level.For subject-wise analysis, the TCs/SMs obtained by ACSD, ssBSS, swbDL, and swsDL for each subject were considered, whereas individual TCs/SMs for cgICA, sgICA, and CODL was obtained by back reconstruction.In contrast, for the group-level analysis, the group-level TCs/SMs obtained by all competing algorithms were used as a reference for further evaluation.Eventually, these TCs were correlated with the MHRs and SMs with RSNs, and the highest correlation values and the respective atoms/sparse codes were saved.Group-level correlation values are specified in Table 3, and the average correlation values over all subjects are mentioned in Table 4.The highest values in these tables have been highlighted in bold.

Resting state dataset
The resting-state dataset of all participating subjects was acquired from the first set of 3T MRI preprocessed S500 and S900 release of the HCP 66,67 .The resting-state dataset was obtained using acquisition parameters identical to the block design dataset.During the experiment, subjects were instructed to maintain fixation on a bright crosshair displayed in a darkened room, and 1200 scans were recorded twice in a single session for two different phase encoding directions.The second run, which featured left-to-right phase encoding, was considered for our study.Only 400 scans were kept for analysis, while the first 20 and the last 780 scans were discarded.The preprocessed resting-state data also went through spatial smoothing using 6 × 6 × 6 mm 3 FWHM Gaussian kernel, followed by temporal smoothing using DCT and temporal FWHM filter.The resting-sate dataset of eight subjects aged between 26 and 35 years was used in our analysis.
Resting state dataset dictionary/component learning Similar to synthetic and block design datasets, concatenated data, random numbers, and DCT bases were deployed for CODL, ssBSS, and ACSD/swbDL/swsDL, respectively, while initializing their respective dictionaries.Both CODL and ssBSS were iterated for 30 iterations due to their slow convergence, whereas ACSD/swbDL/ swsDL were run for 15.For cgICA and sgICA, 100 components were retained using PCA, followed by keeping/ extracting 50 components for the second PCA and ICA/sICA algorithm.Similar to the last two datasets, this section's parameter selection also depended on the correlation strength between the recovered sources and the ground truth.The sparsity parameter was set to 5 and the smoothing parameter to 30000 for sgICA.The number of dictionary atoms for CODL was set to 70, the sparsity parameter to 6, and the batch size equal to the data dimension.For ACSD, 40 dictionary atoms were trained with sparsity parameters set to 30 for each subject.For ssBSS, 60 components were preserved from PCA, and 40 were trained; its tuning parameters were set as 1 = 10 , ζ 1 = 90 , and K p = 150 .For both swbDL and swsDL, 40 atoms were trained, tuning parameters were set as 2 = 10 and α = 500 for swbDL, and tuning parameters were set to ζ 2 = ζ 3 = 80 and 2 = 50 for swsDL.

Resting state dataset results
Due to the absence of task-related components, our analysis in this section was based solely on Smith's resting state templates 64 .Similar to the previous two datasets, this study was also conducted for two different scenarios (i) subject-wise and (ii) group-wise.For subject-wise analysis, the SMs obtained using subject-wise ACSD, ssBSS, swbDL, and swsDL were taken into account, and for cgICA, sgICA, and CODL group-level SMs were considered to back reconstruct individual SMs.In contrast, for the group-level analysis, the shared SMs obtained by all competing algorithms were used as a reference for further evaluation.Eventually, these SMs were correlated with the RSN templates, the highest correlation values, and the respective atoms/sparse codes were saved.Group-level correlation values, along with their mean, are specified in Table 5 where the highest values have been highlighted in bold.

Discussion
There were quite a few activation maps and temporal dynamics for the block design dataset, but only a few have been shown to avoid increasing the paper length.Spatial maps for left and right finger tapping tasks recovered by five algorithms for subject 4 are given in Fig. 8.A series of 2D images assembled to render 3D volume for left/right toe pinching and left/right finger tapping group-level tasks are shown in Fig. 9.The common temporal dynamics for visual cue and tongue and their MHR are plotted in Fig. 10.Tables 3 and 4 show that the proposed swsDL algorithm overall outperforms all other algorithms by yielding atoms/sparse codes having the highest correlation with the ground truth, while swbDL being the second-best for both group-level and individual analysis.From spatial maps in Figs. 8 and 9, it is pretty evident that the activations revealed by ACSD, swbDL, and swsDL are more specific to the motor area compared to the rest of the algorithms.Nevertheless, group-level maps revealed by swsDL are comparatively less specific and more sensitive.
Only some results from the resting-state analysis have been shown here.For instance, (i) TCs obtained for the first five subjects using each of the five competing algorithms for medial visual and frontoparietal left network are shown in Fig. 11, (ii) the SMs for occipital pole visual and default mode network for subject 1 are shown in Figure 12, and (iii) group-level SMs for medial visual, occipital pole visual, lateral visual and default mode network are shown in Fig. 13.From Table 5, it can be concluded that overall, swsDL triumphed over all other algorithms in terms of correlation values.This is also visually supported by Figs. 12 and 13, where the spatial maps by swsDL appear more specific than maps by other algorithms.
For block design and resting-state datasets, Fig. 14 shows the convergence rate and correlation strength accumulation of ACSD, ssBSS, swbDL, and swsDL algorithms, and computational and source retrieval performance of all competing algorithms.The time values have been normalized to show correlation values and computation time in the same graph.It is also worth mentioning that the computation time is the mean of the time consumed by all three datasets, and the source recovery strength is the mean of subject-wise and group-wise correlation values for all three datasets.
Figure 14a shows that the ssBSS and ACSD algorithm and their proposed variants consistently converged over all iterations; however, this uniformity was less evident for swbDL.Figure 14b shows the steady development of correlation strength over all iterations for ssBSS, ACSD, and swsDL.In contrast, this progression seems to have stagnated for swbDL after the first few iterations.From these two subfigures, it becomes evident that compared to swbDL, both ACSD and swsDL gained from the increasing number of iterations.Figure 14c shows that although swsDL has outperformed all its predecessors, its computational performance is not very impressive.In contrast, swbDL has emerged as a runner-up in source recovery performance while having a relatively low  3, the corresponding correlation values are listed.numerical burden.This characteristic makes swbDL more favorable over swsDL/ACSD/ssBSS for FPGA-based implementation of DL algorithms in the future 70,71 .

Conclusion
This paper has presented two new dictionary learning algorithms, swbDL, and swsDL, explicitly designed for subject-wise and group-wise analysis.Unlike the conventional group analysis, the proposed algorithms' main advantage lies in their applicability to both task-related and resting-state fMRI data.Their efficacy has been illustrated using synthetic and experimental fMRI datasets, where their performance was found to be robust  and unwavering across experiments.The computational simplicity of swbDL associated with lesser calls to data and lesser arithmetic operations makes it more favorable over all other DL algorithms.Both of the proposed algorithms are promising alternatives to ACSD, ShSSDL, and sgBACES algorithms.Reducing computational complexity associated with swsDL its extension to hardware realization will be carried out in the future.
The strategy adopted for the proposed algorithms, where dictionary atoms and sparse codes are trained using the base spatiotemporal dynamics, is unprecedented.This approach allows incorporating similar components from the reduced-dimension space across subjects resulting in enhanced statistical power attained due to spatiotemporal variability offered by multi-subject data.The proposed model that considers this strategy through the training of representation matrices and base/dictionary sparse code is computationally intensive to solve.However, by exploiting the fast ssBSS method, block-wise update, and some performance compromise, a computationally efficient solution was reached via swbDL.On the other hand, an iterative approach was pursued using sequential learning that yielded higher source recovery precision at the cost of greater learning time.The convergence of both algorithms was guaranteed due to the sustenance of finite basis injective property and a strict sparsity pattern 72 .

Data and code availability
The experimental fMRI datasets used in this study are open-access and shared on the human connectome website.The Matlab code implemented for this study will be available from the corresponding author upon reasonable request.
specific dynamics is obtained by considering the subject-level residuals R m = Y m − D c X c and error matrix for all signals E m,k = R m − K m i=1,i� =k d m,i x i m as Whereas the k-th atom/sparse code update for the subject-specific dynamics is obtained by considering the common-level residuals R c = 1/M M m=1 Y m − D m X m and error matrix E c,k = R c − K c i=1,i� =k d c,i x sub.to.�d k � 2 = 1 {d k , x k } = arg min d k ,x k

Figure 2 .
Figure 2. A block diagram of the proposed framework where the ssBSS method extracts the base components and the rest of the blocks attempt to recover subject-wise and group-level TCs and SMs either block-wise or sequentially.

Figure 3 .
Figure 3.The top two rows show all spatial sources with respect to the first subject, the location and shape of each of the twelve spatial sources, and their variability across subjects.In contrast, the bottom two rows show the respective temporal sources where TC 7 and TC 8 have four unique patterns.

Figure 4 .
Figure 4. Using five different values of the spread parameter ρ , moderate to substantial spatial overlaps were created by controlling the size of twelve different activation blobs.

Figure 6 .
Figure 6.Over all trials, spatial overlap cases, noise realizations, and subjects, the mean of the (a) convergence rate and (b) correlation values between the ground truth and retrieved sources for subject-wise dictionary learning shown as a function of algorithm iterations.

Figure 7 .
Figure 7. (A) Ground truth TCs/SMs, and the recovered TCs/SMs by (B) sgICA, (C) ACSD, (D) swbDL, and (E) swsDL, along with the absolute temporal and spatial correlation values ( γ ) for each source, and the sum of these correlation values shown on the left.The highest correlation value for each source is shown in a different colour.
fixation blocks of 15 secs .Six modeled HRFs (MHRs) were created utilizing the canonical HRF and task stimuli associated with five different movement types: left toe (LT), left finger (LF), right toe (RT), right finger (RF), tongue (T), and visual type cue (VC) to acquire ground truth TCs.Each subject had their fMRI scans taken using a Siemens 3 Tesla (3T) scanner.The acquisition's specifications were echo time (TE) = 33.1 ms , TR = 0.72 secs , field of view (FOV) = 208 × 180 mm , flip angle (FA) = 52 o , matrix size = 104 × 90 , slice thickness = 2 mm with 72 contiguous slices, and 2 mm isotropic voxels, BW = 2290 Hz/Px , echo spacing = 0.58 ms , and 284 EPI volumes were collected where first 5 were considered dummy and discarded.The block design dataset of eight subjects aged between 22 and 35 years was used in our analysis.

Figure 8 .
Figure 8.The thresholded fourth subject's activation maps at a random field correction p < 0.001 extracted for the left and right finger movement tasks of the block design dataset using (a) sgICA, (b) CODL, (c) ACSD, (d) swbDL, and (e) swsDL, respectively.Table 4 provides the related averaged correlation values.

Figure 10 .
Figure 10.The most correlated group-level dictionary atom with MHR retrieved using (a) sgICA, (b) CODL, (c) ACSD, (d) swbDL, and (e) swsDL for the (A) visual cue, and (B) tongue movement task of the block design dataset.In Table3, the corresponding correlation values are listed.

Figure 11 .
Figure 11.The TCs for subject number 1-5 shown in sub-figures a-e respectively extracted using sgICA, CODL, ACSD, swbDL, and swsDL for the (A) medial visual, and (B) frontoparietal left networks of the resting state dataset.Table 5 contains the corresponding correlation values and their means.

Figure 14 .
Figure 14.As a function of algorithm iterations, the mean of the (a) convergence rate and (b) correlation values over all subjects for subject-wise dictionary learning, and (c) time consumption and source retrieval strength by all competing algorithms.

Table 3 .
For the block design dataset, correlation values of the most correlated group-level dictionary atom with six MHRs and most correlated common spatial maps with RSN templates obtained using seven competing algorithms, including the proposed.

Table 4 .
For six different block design tasks, correlation values of the most correlated back reconstructed component/atom with MHRs using cgICA/sgICA/CODL and averaged correlation values over the most correlated subject-level dictionary atom with MHRs obtained using ACSD/ssBSS/swbDL/swsDL.

Table 5 .
For resting-sate data, correlation values of the most correlated common spatial maps with RSN templates obtained using seven competing algorithms, including the proposed.